Add adaptive embedding throughput shaping for Azure 429 limits by BenjaminMichaelis · Pull Request #1115 · IntelliTect/EssentialCSharp.Web

BenjaminMichaelis · 2026-05-16T05:43:39Z

Why

The previous retry-only fix still failed under sustained S0 throttling: large embedding requests kept exhausting retries at the same payload size. We need throughput shaping so rebuilds can continue progressing under rate limits instead of stalling at repeated 429 exhaustion.

What changed

Added adaptive batch downshifting in embedding rebuilds:
- starts at configured max batch size
- on 429/RateLimitReached, splits throttled batches and retries smaller sub-batches
- reuses the smaller successful size for subsequent requests in the same run
- fails clearly if batch size 1 still exhausts retries
Added explicit request pacing controls:
- AIOptions:EmbeddingRetry:MaxEmbeddingBatchSize (default 2048)
- AIOptions:EmbeddingRetry:MinInterRequestDelayMs (default 250)
- embedding requests are serialized and paced between calls to reduce sustained RPM pressure
Hardened Retry-After parsing:
- supports retry-after, retry-after-ms, x-ms-retry-after-ms
- supports extracting retry after N seconds from exception message text
Added coarse progress logging during rebuilds (not per call):
- logs start configuration
- logs progress at 10% milestones when total count is known
- falls back to every 500 chunks when total count is unknown
- includes current adaptive batch size in progress messages
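
The downshifting behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in EmbeddingService.cs: `RateLimitedError`, `embed_with_downshift`, and `send_batch` are hypothetical names, the halving policy is an assumption, and the loop is an iterative variant of the recursive splitting the PR describes. `send_batch` is assumed to raise only after its own per-request retries are exhausted.

```python
from typing import Callable, List, Sequence


class RateLimitedError(Exception):
    """Hypothetical stand-in for a 429/RateLimitReached failure after retries."""


def embed_with_downshift(
    items: Sequence[str],
    send_batch: Callable[[Sequence[str]], List[List[float]]],
    max_batch_size: int = 2048,
) -> List[List[float]]:
    """Embed all items, halving the batch size whenever a batch is throttled."""
    results: List[List[float]] = []
    batch_size = max_batch_size  # start at the configured maximum
    index = 0
    while index < len(items):
        batch = items[index:index + batch_size]
        try:
            results.extend(send_batch(batch))
        except RateLimitedError:
            if len(batch) <= 1:
                # Nothing left to split, so fail with a clear message.
                raise RuntimeError(
                    "Embedding request still throttled at batch size 1"
                ) from None
            # Assumed policy: halve, then retry the same position.
            # The smaller size is kept for all later batches in this run.
            batch_size = max(1, len(batch) // 2)
            continue
        index += len(batch)
    return results
```

With a fake `send_batch` that rejects any batch larger than 3 items, 10 inputs, and a maximum of 8, the attempted batch sizes are 8, 4, then 2 for the rest of the run.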

Validation

dotnet build EssentialCSharp.Chat.Shared/EssentialCSharp.Chat.Common.csproj -c Release --nologo
dotnet test EssentialCSharp.Chat.Tests/EssentialCSharp.Chat.Tests.csproj -c Release --no-restore -v q

Both passed.

- Downshift embedding batch size on repeated 429s by recursively splitting batches
- Reuse successful smaller batch size for subsequent requests in the same run
- Fail clearly when batch size 1 still receives sustained 429 throttling
- Add sequential request pacing with configurable min inter-request delay
- Add configurable MaxEmbeddingBatchSize and MinInterRequestDelayMs options
- Harden Retry-After parsing for retry-after, retry-after-ms, x-ms-retry-after-ms, and message hints
- Update configuration comments and default appsettings values
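
As a rough illustration of sequential request pacing with a minimum inter-request delay, here is a minimal Python sketch. `PacedCaller` and its parameters are hypothetical names, not the service's API. Measuring the gap from the start of the previous request is an assumption, loosely based on the field name `_lastEmbeddingRequestStartedUtc` mentioned in this PR. The clock and sleep functions are injectable only to make the sketch testable.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PacedCaller:
    """Serializes calls and enforces a minimum gap between call starts."""

    def __init__(
        self,
        min_delay_ms: int = 250,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_delay = min_delay_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last_started: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        with self._lock:  # only one request is in flight at a time
            if self._last_started is not None:
                elapsed = self._clock() - self._last_started
                wait = self._min_delay - elapsed
                if wait > 0:
                    self._sleep(wait)  # pad out the remaining gap
            self._last_started = self._clock()
            return fn()
```

Holding the lock while sleeping is deliberate in this sketch: it keeps requests strictly ordered, at the cost of blocking other callers for the duration of the delay.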


Copilot

Pull request overview

Improves resilience of Azure OpenAI embedding rebuilds under sustained throttling by introducing adaptive batch downshifting, request pacing, and more robust Retry-After handling so rebuilds can continue progressing instead of repeatedly exhausting retries.

Changes:

- Added adaptive batch splitting/downshifting on 429/RateLimitReached during embedding rebuild uploads.
- Serialized and paced embedding requests with a configurable minimum inter-request delay.
- Hardened Retry-After parsing (more header variants + message parsing) and added coarse rebuild progress logging.
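
The Retry-After handling could look something like the following Python sketch. The function name, the precedence between headers, and the message regex are all assumptions made for illustration; only the three header names and the "retry after N seconds" message hint come from this PR. The HTTP-date form of Retry-After is deliberately not handled here.

```python
import re
from typing import Mapping, Optional

# Assumed wording of the hint embedded in exception messages.
_MESSAGE_HINT = re.compile(r"retry after (\d+) seconds?", re.IGNORECASE)


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a header value as a non-negative number, or return None."""
    if value is None:
        return None
    try:
        number = float(value.strip())
    except ValueError:
        return None
    return number if number >= 0 else None


def parse_retry_after_seconds(
    headers: Mapping[str, str], message: str = ""
) -> Optional[float]:
    """Return the suggested wait in seconds, or None if no hint is found."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Millisecond headers are checked first (assumed precedence).
    for name in ("retry-after-ms", "x-ms-retry-after-ms"):
        millis = _to_float(lowered.get(name))
        if millis is not None:
            return millis / 1000.0
    seconds = _to_float(lowered.get("retry-after"))
    if seconds is not None:
        return seconds
    # Fall back to a hint inside the exception message text.
    match = _MESSAGE_HINT.search(message)
    if match:
        return float(match.group(1))
    return None
```

A caller would typically fall back to its own backoff schedule when this returns None.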

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
EssentialCSharp.Web/appsettings.json	Adds default configuration values for max embedding batch size and inter-request pacing delay.
EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs	Implements pacing/serialization, adaptive batch downshifting on throttling, improved Retry-After extraction, and rebuild progress logging.
EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs	Introduces new retry/pacing configuration knobs with validation.
EssentialCSharp.Chat.Shared/Extensions/ServiceCollectionExtensions.cs	Clarifies configuration override semantics for the embedding retry options binding.


- Log request attempt state before each embedding call with batch sizing fields
- Log successful batch requests using the same structured state event
- Log throttled downshift transitions with old/new effective batch size context
- Add end-of-run successful batch-size summary counts for production tuning

BenjaminMichaelis added 2 commits May 15, 2026 22:42

Add coarse embedding rebuild progress logging

c453ff3

- Log embedding rebuild start with known total (when available)
- Emit progress at 10% milestones when total chunk count is known
- Fall back to every 500 chunks when total is unknown
- Include current adaptive batch size in progress logs
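
One way to picture the milestone logic in this commit is the Python sketch below. `ProgressTracker`, its method names, and the message format are hypothetical; only the 10% milestones, the 500-chunk fallback, and the inclusion of the current batch size come from the commit message. The threshold check uses integer cross-multiplication rather than division, which avoids rounding in the comparison.

```python
from typing import Optional


class ProgressTracker:
    """Decides when a coarse progress message should be logged."""

    def __init__(self, total: Optional[int], fallback_every: int = 500) -> None:
        # Treat a missing or non-positive total as "unknown".
        self._total = total if total and total > 0 else None
        self._fallback_every = fallback_every
        self._next_percent = 10
        self._next_count = fallback_every

    def update(self, processed: int, batch_size: int) -> Optional[str]:
        """Return a log message at a milestone, otherwise None."""
        if self._total is not None:
            # processed / total >= next_percent / 100, without dividing.
            if processed * 100 < self._next_percent * self._total:
                return None
            percent = processed * 100 // self._total
            # Skip past every milestone already crossed.
            self._next_percent = (percent // 10 + 1) * 10
            return (
                f"Embedded {processed}/{self._total} chunks "
                f"({percent}%), batch size {batch_size}"
            )
        if processed < self._next_count:
            return None
        step = self._fallback_every
        self._next_count = (processed // step + 1) * step
        return f"Embedded {processed} chunks, batch size {batch_size}"
```

Because a single batch can cross several milestones at once, the tracker emits one message and then advances to the next milestone that has not yet been reached.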

Copilot AI review requested due to automatic review settings May 16, 2026 05:43

Copilot started reviewing on behalf of BenjaminMichaelis May 16, 2026 05:44

Copilot AI reviewed May 16, 2026


Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs

Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Outdated

Fix pacing scope and progress math overflow

3938a3c

- Make embedding pacing timestamp static to match static request lock scope
- Use long arithmetic in percent progress threshold comparison to avoid overflow

github-code-quality Bot found potential problems May 16, 2026


Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Fixed

BenjaminMichaelis added 2 commits May 15, 2026 22:58

Fix static state write warning in embedding pacing

cf1ae31

- Make _lastEmbeddingRequestStartedUtc instance-scoped
- Keep pacing behavior unchanged for singleton DI registration

BenjaminMichaelis merged commit 920c021 into main May 16, 2026
8 checks passed

BenjaminMichaelis deleted the benjaminmichaelis/embedding-throughput-shaping branch May 16, 2026 06:21

BenjaminMichaelis merged 5 commits into main from benjaminmichaelis/embedding-throughput-shaping
